Missing Data Approaches in eHealth Research: Simulation Study and a Tutorial for Nonmathematically Inclined Researchers
نویسندگان
چکیده
BACKGROUND Missing data is a common nuisance in eHealth research: it is hard to prevent and may invalidate research findings. OBJECTIVE In this paper several statistical approaches to data "missingness" are discussed and tested in a simulation study. Basic approaches (complete case analysis, mean imputation, and last observation carried forward) and advanced methods (expectation maximization, regression imputation, and multiple imputation) are included in this analysis, and strengths and weaknesses are discussed. METHODS The dataset used for the simulation was obtained from a prospective cohort study following participants in an online self-help program for problem drinkers. It contained 124 nonnormally distributed endpoints, that is, daily alcohol consumption counts of the study respondents. Missingness at random (MAR) was induced in a selected variable for 50% of the cases. Validity, reliability, and coverage of the estimates obtained using the different imputation methods were calculated by performing a bootstrapping simulation study. RESULTS In the performed simulation study, the use of multiple imputation techniques led to accurate results. Differences were found between the 4 tested multiple imputation programs: NORM, MICE, Amelia II, and SPSS MI. Among the tested approaches, Amelia II outperformed the others, led to the smallest deviation from the reference value (Cohen's d = 0.06), and had the largest coverage percentage of the reference confidence interval (96%). CONCLUSIONS The use of multiple imputation improves the validity of the results when analyzing datasets with missing observations. Some of the often-used approaches (LOCF, complete cases analysis) did not perform well, and, hence, we recommend not using these. Accumulating support for the analysis of multiple imputed datasets is seen in more recent versions of some of the widely used statistical software programs making the use of multiple imputation more readily available to less mathematically inclined researchers.
منابع مشابه
Accuracy evaluation of different statistical and geostatistical censored data imputation approaches (Case study: Sari Gunay gold deposit)
Most of the geochemical datasets include missing data with different portions and this may cause a significant problem in geostatistical modeling or multivariate analysis of the data. Therefore, it is common to impute the missing data in most of geochemical studies. In this study, three approaches called half detection (HD), multiple imputation (MI), and the cosimulation based on Markov model 2...
متن کاملچند رویکرد برخورد با مقادیر گمشده متغیرهای کمی و بررسی اثر آنها بر نتایج حاصل از یک کارآزمایی بالینی
Background and Objectives: A major challenge that affects the longitudinal studies is the problem of missing data. Missing in the data may result in the loss of part of the information which reduces the accuracy of the estimator and obtain the results will be biased and inaccurate. Therefore, it is necessary to evaluate the missing data mechanism from a longitudinal research and to consider thi...
متن کاملTutorial Review: Simulation of Oscillating Chemical Reactions Using Microsoft Excel Macros
Oscillating reactions are one of the most interesting topics in chemistry and analytical chemistry. Fluctuations in concentrations of one the reacting species (usually a reaction intermediate) create an oscillating chemical reaction. In oscillating systems, the reaction is far from thermodynamic equilibrium. In these systems, at least one autocatalytic step is required. Developing an instinctiv...
متن کاملThe Development of Maximum Likelihood Estimation Approaches for Adaptive Estimation of Free Speed and Critical Density in Vehicle Freeways
The performance of many traffic control strategies depends on how much the traffic flow models have been accurately calibrated. One of the most applicable traffic flow model in traffic control and management is LWR or METANET model. Practically, key parameters in LWR model, including free flow speed and critical density, are parameterized using flow and speed measurements gathered by inductive ...
متن کاملA tutorial on Quasi-experimental designs
A main step in answering a scientific hypothesis in an epidemiological study is deciding which type of study is suitable to be undertaken, considering methodology, practical considerations and budget and time limitations
متن کامل